A Novel Algorithm for Finding Interspersed Repeat Regions
Authors
Abstract
The analysis of repeats in DNA sequences is an important subject in bioinformatics. In this paper, we propose a novel projection-assemble algorithm to find unknown interspersed repeats in DNA sequences. The algorithm uses a random projection algorithm to obtain a candidate fragment set, then applies an exhaustive search to each pair of fragments in that set to find potential linkages, and finally assembles the linked fragments. The complexity of the projection-assemble algorithm is nearly linear in the length of the genome sequence, and its memory usage is limited by the hardware. We tested the algorithm on both simulated and real biological data, and the results show that it is efficient. Using this algorithm, we found an unlabeled repeat region that occurs five times in the Escherichia coli genome, with a length of more than 5,000 bp and a mismatch probability of less than 4%.
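The candidate-generation step can be illustrated with a minimal sketch of generic random projection, which is not the authors' implementation. Every l-mer is bucketed by the characters at k randomly chosen positions, and fragments that share a bucket are then compared pairwise. The function names and the parameters `l`, `k`, `max_mismatches`, and `seed` are assumptions made for illustration; the assembly step is not shown.

```python
import random
from collections import defaultdict


def projection_candidates(seq, l=12, k=8, seed=0):
    """Bucket every l-mer of seq by the characters at k random positions."""
    rng = random.Random(seed)
    positions = sorted(rng.sample(range(l), k))  # the random projection
    buckets = defaultdict(list)
    for start in range(len(seq) - l + 1):
        window = seq[start:start + l]
        key = "".join(window[p] for p in positions)
        buckets[key].append(start)
    # Only buckets holding two or more start offsets hint at a repeat.
    return {key: starts for key, starts in buckets.items() if len(starts) >= 2}


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def candidate_pairs(seq, l=12, k=8, max_mismatches=2, seed=0):
    """Exhaustively check each pair inside a bucket (hypothetical helper)."""
    pairs = set()
    for starts in projection_candidates(seq, l, k, seed).values():
        for i in range(len(starts)):
            for j in range(i + 1, len(starts)):
                a, b = starts[i], starts[j]
                # Skip overlapping windows; keep pairs with few mismatches.
                if b - a >= l and hamming(seq[a:a + l], seq[b:b + l]) <= max_mismatches:
                    pairs.add((a, b))
    return sorted(pairs)
```

Two identical l-mers always land in the same bucket, whichever positions are sampled. Fragments that differ at a sampled position are missed by a single projection, which is why such schemes are usually repeated with several independent projections.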
Similar resources
Identification of novel simple sequence length polymorphisms (SSLPs) in mouse by interspersed repetitive element (IRE)-PCR.
Interspersed repetitive element (IRE)-PCR is a useful method for identification of novel human or mouse sequence tagged sites (STSs) from contigs of genomic clones. We describe the use of IRE-PCR with mouse B1 repetitive element primers to generate novel, PCR amplifiable, simple sequence length polymorphisms (SSLPs) from yeast artificial chromosome (YAC) clones containing regions of mouse chrom...
Efficient repeat finding via suffix arrays
We solve the problem of finding interspersed maximal repeats using a suffix array construction. As is well known, all the functionality of suffix trees can be handled by suffix arrays, gaining practicality. Our solution improves the suffix tree based approaches for the repeat finding problem, being particularly well suited for very large inputs. We prove the correctness and complexity of th...
H3K27me3 forms BLOCs over silent genes and intergenic regions and specifies a histone banding pattern on a mouse autosomal chromosome.
In mammals, genome-wide chromatin maps and immunofluorescence studies show that broad domains of repressive histone modifications are present on pericentromeric and telomeric repeats and on the inactive X chromosome. However, only a few autosomal loci such as silent Hox gene clusters have been shown to lie in broad domains of repressive histone modifications. Here we present a ChIP-chip analysi...
A Novel Approach to Background Subtraction Using Visual Saliency Map
Generally, the human visual system searches for salient regions and movements in video scenes to reduce the search space and effort. Using a visual saliency map for modelling gives important information for understanding in many applications. In this paper we present a simple method with low computational load using a visual saliency map for background subtraction in a video stream. The proposed technique i...
Functional repeat-derived RNAs often originate from retrotransposon-propagated ncRNAs
The human genome is scattered with repetitive sequences, and the ENCODE project revealed that 60-70% of the genomic DNA is transcribed into RNA. As a consequence, the human transcriptome contains a large portion of repeat-derived RNAs (repRNAs). Here, we present a hypothesis for the evolution of novel functional repeat-derived RNAs from non-coding RNAs (ncRNAs) by retrotransposition. Upon ampli...
Journal title:
Volume 2, issue
Pages -
Publication date: 2004